ABSTRACT

Merging wireless traces is a fundamental step in measurement-based studies involving multiple packet sniffers. Existing merging tools either require a wired infrastructure or are limited in their usability. We propose WiPal, an offline merging tool for IEEE 802.11 traces that has been designed to be efficient and simple to use. WiPal is flexible in the sense that it requires no specific services, either from the monitors (such as synchronization, access to a wired network, or running specific software) or from its software environment (e.g., an SQL server). We present WiPal's operation and show how its features, notably its modular design, improve both ease of use and efficiency. Experiments on real traces show that WiPal is an order of magnitude faster than other tools providing the same features. To our knowledge, WiPal is the only offline trace merger that the research community can use in a straightforward fashion.

1. INTRODUCTION 

Sniffing is a common technique for monitoring wireless networks. It consists of deploying a number of monitors (or sniffers) within a target area; the monitors capture all the wireless traffic they hear and produce traces of MAC frame exchanges. Wireless sniffing is a fundamental step in a number of network operations, including network diagnosis [1], security enhancement [2], and behavioral analysis of protocols [3, 4, 5, 6].

Wireless sniffing often involves a centralized process that is responsible for combining the traces [3, 4, 5]. The objective is to obtain a global view of the wireless activity from multiple local measurements. Individual sniffers can also compensate for their frame losses with data from other sniffers. Merging is, however, a difficult task: it requires precise synchronization among traces (to within a few microseconds) and must cope with the unreliable nature of the medium (frame loss is unavoidable). The literature provides the community with a number of merging tools, but they either require a wired infrastructure or are too specific to the experiments conducted in the corresponding papers (see Section 2 for details).



In this paper we present WiPal, an IEEE 802.11 trace merging tool that focuses on ease of use, flexibility, and speed. By explaining WiPal's design choices and internals, we intend to complement existing papers and give additional insights into the complex process of trace merging. WiPal has several characteristics that distinguish it from the few other trace mergers:

Offline tool. Being an offline tool makes WiPal independent of the monitors: one may use any software to acquire data. Most trace mergers expect monitors to run specific software [3, 7].

Independent of infrastructure. WiPal's algorithms do not expect features from traces that would require monitors to access a network infrastructure (e.g., synchronization). Monitors just need to record data in a compatible input format.

Compliant with multiple formats. WiPal supports most of the existing input formats, whereas other trace mergers require a specific format. Some tools even require a custom, dedicated format [3].

Hands-on tool. WiPal can be used in a straightforward fashion by simply calling the appropriate programs on trace files. Other mergers require more complex setups (e.g., a database server [4] or a network setup involving multiple servers [3]).

This paper provides an analysis that supports these choices (cf. Sections 4 and 5). First, the proposed synchronization mechanism exhibits better precision than existing algorithms. Second, WiPal is an order of magnitude faster than the other publicly available offline merger, Wit [4]. This analysis uses CRAWDAD's uw/sigcomm2004 dataset [9], recorded during the SIGCOMM 2004 conference¹. It allows us to calibrate various parameters of WiPal, validate its operation, and show its efficiency. WiPal is, however, not designed for a specific dataset and works on any wireless traces using an appropriate input format (WiPal's test suite includes various synthetic traces with different formats). We believe that WiPal will be of great use to the research community working on wireless network measurements.

¹ To the best of our knowledge, this is the only dataset that is both publicly available and provides enough data to perform merging operations.

A. The traces are not synchronized and miss some frames.

B. One identifies some reference frames common to both traces. This information enables trace synchronization.

C. One adjusts the frames' timestamps and synchronizes T1 and T2.

D. One can merge the traces. Duplicate frames are counted only once.

Figure 1: Merging two traces T1 and T2.

2. TRACE MERGING: OVERVIEW 

Wireless sniffing requires multiple monitors for coverage and redundancy reasons. Coverage is a concern when the distance between the monitor and at least one of the transmitters to be sniffed is too large to ensure a minimum reception threshold. Redundancy is needed because of the unreliability of the wireless medium: even in good radio conditions, monitors may miss successfully transmitted frames. After the collection phase, the traces must be combined into one. A merged trace holds all the frames recorded by the different monitors and gives a global view of the network traffic.

The traditional approach to merging traces involves a synchronization step, which aligns frames according to their timestamps. This makes it possible to identify all frames that are identical across traces, so that each appears once and only once in the output trace (Cheng et al. [3] refer to this as unification). This process is illustrated in Fig. 1.

Synchronization is difficult to obtain because, in order to be useful, it must be very precise. Imprecise frame timestamps may result in duplicate frames and incorrect ordering in the output trace. An invalid synchronization may also cause distinct frames to be counted as the same frame in the output trace. Avoiding such undesirable effects requires a precision better than 106 µs [5]. To our knowledge, no existing hardware supports synchronizing network cards' clocks with such precision (note that we are interested in frame arrival times at the card, not in the operating system).

Therefore, all merging tools post-process traces to resynchronize them with the help of reference frames, which are frames that appear in multiple traces. One may readjust the traces' timing information using the timestamps of the reference frames (see Fig. 1). Finding reference frames is, however, a hard task, since we must be sure that a given reference frame is an occurrence of the same frame in every trace. That is, some frames that occur frequently (e.g., MAC acknowledgements) cannot be used as reference frames because their content does not vary enough. Therefore, only a subset of frames is used as reference frames, as explained in Section 4.

A few trace merging tools exist in the literature, but they do not focus on the same set of features as this paper. For instance, Jigsaw [3] is able to merge traces from hundreds of monitors, but requires monitors to access a network infrastructure. WisMon [7] is an online tool with similar requirements. This paper, in contrast, considers smaller-scale systems (dozens of monitors) in which no monitor can access a network infrastructure. Another system close to ours is Wit [8, 4]. Although Wit provides valuable insights on how to develop a merging tool, it is difficult to use, modify, and extend in practice (cf. the authors' note in CRAWDAD [8]). This motivated us to propose a new trace merger. Note that this paper only refers to Wit's merging process (Wit has other features, e.g., a module to infer missing packets).

3. WIPAL'S BASICS

WiPal has been designed according to the following constraints:

No wired connectivity. The sniffers must be able to work in environments where no wired connectivity is provided. This enables measurements where it is difficult to give all sniffers access to a shared network infrastructure (e.g., in some conference venues, or when studying interference between two wireless networks belonging to distinct entities).

Simplicity for the end user. We believe simplicity is the key to reusability. Users are not expected to install and set up complex systems (e.g., a database backend) in order to use WiPal.

Clean design. WiPal exhibits a modular design. Developers can easily adapt individual parts of the trace merger (e.g., the reference frame identification process, the synchronization, or the merging algorithm).
Figure 2: WiPal's overall structure.

For these reasons, we opted for an offline trace merger that does not require traces to be synchronized a priori. Concretely, the sniffers only have to record their measurements on a local storage device, using the widely used PCAP (Packet CAPture) file format. WiPal comes as a set of binaries to manipulate wireless traces, including the merging tool presented in this paper. It works directly on PCAP files, both as input and as output. WiPal consists of roughly 10k lines of C++ and makes heavy use of modern generic and static programming techniques. WiPal can be downloaded from http://wipal.lip6.fr.

4. WIPAL'S DETAILED OPERATION

Fig. 2 depicts WiPal's structure. Each box represents a distinct module and arrows show WiPal's data flow. WiPal takes two wireless traces as input and produces a single merged trace². In the following, we explain in detail how each of the modules works.

4.1 Identifying reference frames 

This section explains the process of extracting reference frames. This operation involves two steps: extraction of unique frames and intersection of unique frames (see Fig. 2).

Let us first define what a unique frame means. A frame is said to be unique when it appears "in the air" once and only once for the whole duration of the measurement. A frame that is unique within each trace but that actually appeared twice on the wireless medium should not be considered unique.

² To merge more than two traces, it suffices to execute the merging tool as many times as required (two traces at a time).

The process of extracting unique frames finds candidates to become reference frames. The process of intersecting unique frames then identifies identical unique frames in both traces; these become the reference frames.

4.2 Extraction of unique frames 

WiPal considers every beacon frame and every non-retransmitted probe response as a unique frame. These are management frames that access points send on a regular basis (e.g., every 100 ms for beacon frames). These frames are unique because of the 64-bit timestamps they embed (these timestamps are unrelated to the arrival timestamps used for synchronization).
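As a rough illustration of this selection rule, the Python sketch below classifies a raw frame from its Frame Control field. It is not WiPal's code (WiPal is written in C++). It assumes the bytes start at the 802.11 MAC header, with any capture header (radiotap, Prism, AVS) already stripped, and relies on the standard field layout: type 0 is management, subtype 8 is a beacon, subtype 5 is a probe response, and the Retry flag is bit 0x08 of the second Frame Control byte. The function name `is_unique_candidate` is hypothetical.

```python
def is_unique_candidate(frame: bytes) -> bool:
    """Return True for beacons and non-retransmitted probe responses.

    `frame` must start with the IEEE 802.11 Frame Control field (no capture
    header). The name and the byte layout assumptions are illustrative.
    """
    if len(frame) < 2:
        return False
    fc0, flags = frame[0], frame[1]
    frame_type = (fc0 >> 2) & 0x3   # bits 2-3: 0 = management
    subtype = (fc0 >> 4) & 0xF      # bits 4-7: 8 = beacon, 5 = probe response
    retry = bool(flags & 0x08)      # Retry flag in the second Frame Control byte
    if frame_type != 0:
        return False
    if subtype == 8:                # every beacon is a candidate
        return True
    return subtype == 5 and not retry  # probe responses only if not retransmitted
```

Beacons are accepted even with the Retry bit set, mirroring the text, which only excludes retransmitted probe responses.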

In practice, the extraction process does not load full frames into memory. Instead, it stores 16-byte hashes in memory and uses them for comparisons. Limiting the size of stored information is important since, as we will see later, WiPal's intersection process performs many comparisons and needs to store many unique frames in memory. Tests with CRAWDAD's uw/sigcomm2004 dataset [9] have shown that this technique is practical: WiPal needs less than 600 MB to load 7,700,000 unique frames.
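A minimal sketch of this extraction step, assuming each trace is available as (arrival time, raw frame) pairs and that both traces store frames the same way (e.g., both with or both without the FCS). The paper does not say which hash function WiPal uses; MD5 appears here only because its digest is exactly 16 bytes. `extract_unique` and its `is_candidate` parameter are hypothetical names; the predicate would implement the beacon/probe-response rule above.

```python
import hashlib
from typing import Callable, Iterable, Iterator, Tuple

Frame = Tuple[int, bytes]  # (arrival time, raw 802.11 frame or its digest)


def frame_digest(frame: bytes) -> bytes:
    """16-byte identifier for a frame (MD5 chosen only for its digest size)."""
    return hashlib.md5(frame).digest()


def extract_unique(frames: Iterable[Frame],
                   is_candidate: Callable[[bytes], bool]) -> Iterator[Frame]:
    """Yield (arrival time, 16-byte digest) for each candidate unique frame.

    Only the digest is kept, never the full frame, to bound memory use.
    """
    for arrival, frame in frames:
        if is_candidate(frame):
            yield arrival, frame_digest(frame)
```

For scale, 600 MB for 7,700,000 unique frames is roughly 80 bytes per frame, so the 16-byte digest is only part of the per-frame cost; the remainder presumably goes to timestamps and container overhead.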

There are some rare cases where the assumption that beacons and probe responses are unique does not hold. The uw/sigcomm2004 dataset contains a total of 50,375,921 unique frames (about 14% of its 364,081,644 frames). Among those frames, we detected 5 collisions (distinct unique frames sharing identical hashes). WiPal's intersection process includes a filtering mechanism to detect and filter out such collisions.

4.3 Intersection 

The intersection process intersects the sets of unique frames from both input traces. There are multiple algorithms to perform such a task. Following Cheng et al. [3], one solution is to "bootstrap" the system by finding the first unique frame common to both traces and then using this reference frame as a basis for the synchronization mechanism, as shown in Algorithm 1. One may also use subsequent reference frames to update the synchronization. This algorithm is practical because the inner loop only searches a very limited subset of I2. It has several drawbacks, though: (i) the performance of the algorithm strongly depends on the precision of the synchronization process; (ii) finding the first reference frame is still an issue; (iii) this algorithm couples intersection with synchronization, which is undesirable with respect to modularity; and (iv) some frames may be read multiple times from I2. More specifically, access to I2 is not sequential.
Algorithm 1 Intersection using synchronization.
Input: two lists of unique frames I1 and I2.
Output: a list of reference frames.
  δ ← synchronization precision
  for all u1 ∈ I1 do
    t_u1 ← u1's time of arrival
    for all u2 ∈ I2 between t_u1 − δ and t_u1 + δ do
      if u2 is an occurrence of u1 then
        Append (u1, u2) to output
      end if
    end for
  end for
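Algorithm 1 can be sketched in Python as below. The sketch assumes both lists hold (arrival time, content hash) pairs sorted by arrival time, that I2's timestamps are already aligned with I1's to within δ (the bootstrapping and the resynchronization updates mentioned above are left out), and that "is an occurrence of" means equal hashes. The binary search is what restricts the inner loop to a small window of I2; the second test shows how the result hinges on δ.

```python
from bisect import bisect_left, bisect_right
from typing import List, Tuple

Unique = Tuple[float, bytes]  # (arrival time, content hash)


def intersect_with_sync(i1: List[Unique], i2: List[Unique],
                        delta: float) -> List[Tuple[Unique, Unique]]:
    """Sketch of Algorithm 1: match each u1 against I2 frames within +/- delta.

    Assumes i1 and i2 are sorted by arrival time and that i2's clock is
    already aligned with i1's to within delta.
    """
    times2 = [t for t, _ in i2]
    out = []
    for u1 in i1:
        t_u1, h1 = u1
        # Only the slice of I2 inside [t_u1 - delta, t_u1 + delta] is scanned.
        lo = bisect_left(times2, t_u1 - delta)
        hi = bisect_right(times2, t_u1 + delta)
        for u2 in i2[lo:hi]:
            if u2[1] == h1:  # u2 is an occurrence of u1
                out.append((u1, u2))
    return out
```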



Algorithm 2 WiPal's intersection algorithm.
Input: two lists of unique frames I1 and I2.
Output: a list of reference frames.
  h ← ∅    ▷ Implement h with a hash table.
  for all u1 ∈ I1 do
    Insert u1 into h.
  end for
  for all u2 ∈ I2 do
    if h contains an occurrence u1 of u2 then
      Append (u1, u2) to output
    end if
  end for
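A Python sketch of Algorithm 2, again assuming unique frames are (arrival time, hash) pairs; a dict plays the role of h. It also drops hashes that occur more than once in I1, which is the collision filtering described later in this section. Unlike that description, the sketch removes colliding entries during the first loop rather than in a separate pass; the resulting table is the same. No timestamp is consulted, so no synchronization is needed, and each list is read once, sequentially.

```python
from typing import Dict, Hashable, Iterable, List, Set, Tuple

Unique = Tuple[float, Hashable]  # (arrival time, content hash)


def intersect(i1: Iterable[Unique],
              i2: Iterable[Unique]) -> Tuple[List[Tuple[Unique, Unique]], Set[Hashable]]:
    """Sketch of Algorithm 2: hash-based intersection, no timestamps needed.

    Returns the reference-frame pairs (in I2 order) and the hashes that
    collided within I1.
    """
    table: Dict[Hashable, Unique] = {}
    collisions: Set[Hashable] = set()
    for u1 in i1:                      # first loop: one sequential pass over I1
        key = u1[1]
        if key in table or key in collisions:
            collisions.add(key)        # not really unique: drop it from h
            table.pop(key, None)
        else:
            table[key] = u1
    refs = []
    for u2 in i2:                      # second loop: one sequential pass over I2
        u1 = table.get(u2[1])
        if u1 is not None:
            refs.append((u1, u2))
    return refs, collisions
```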



WiPal includes an algorithm that is much simpler to implement and avoids the drawbacks of the above-mentioned solution. The main characteristics of the proposed algorithm (detailed in Algorithm 2) are: (i) it does not require a bootstrapping phase; (ii) it does not depend on any kind of synchronization; and (iii) it reads each frame of I1 and I2 only once, sequentially.

The algorithm starts by loading all unique frames of the first trace into memory, which precludes using it as an online tool. Note that loading all unique frames from a trace into memory may hog resources; this is why small identifiers for unique frames matter. These constraints are, however, negligible compared to those of Algorithm 1. To support our argument, consider an example from the uw/sigcomm2004 dataset. The biggest traces are those from sniffers mojave and sonoran on channel 11 (roughly 19 GB each). Extracting these traces' unique frames and intersecting them using WiPal requires 575 MB of memory. Memory consumption is therefore not a concern for Algorithm 2.

Another advantage of Algorithm 2 is its ability to detect collisions of unique frames within the first trace: collisions show up as duplicate elements in h. WiPal detects such cases, memorizes the collisions, and filters them out of the hash table before starting the algorithm's second loop. Of course, collisions in the second trace remain undetected. Even if WiPal detected them, a collision could still span both traces (i.e., each trace contains one occurrence of a colliding unique frame). Such cases produce invalid reference frames. To detect them, WiPal looks for anomalies in the interarrival times between unique frames. In practice, invalid references are rare: there were only three occurrences when merging uw/sigcomm2004's channel 11 (a 73 GB input that produces a 22 GB output).

Figure 3: Average synchronization error w.r.t. linear regression window size.
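The paper does not detail how interarrival anomalies are detected. One plausible heuristic, assumed here purely for illustration: since the two clocks drift slowly, the time between two consecutive reference frames should be nearly the same in both traces, so a reference whose intervals to both neighbours disagree by more than a tolerance is suspect. `suspicious_references` and its `tol` threshold are hypothetical, and the first and last references are left unchecked in this sketch.

```python
from typing import List, Tuple


def suspicious_references(refs: List[Tuple[float, float]], tol: float) -> List[int]:
    """Hypothetical check: flag interior references whose interarrival times
    to both neighbours differ between the two traces by more than `tol`.

    refs: (t1, t2) arrival times of each reference frame, sorted by t1.
    """
    # bad_gap[k] is True when the interval R_k -> R_{k+1} disagrees across traces.
    bad_gap = [
        abs((refs[k + 1][0] - refs[k][0]) - (refs[k + 1][1] - refs[k][1])) > tol
        for k in range(len(refs) - 1)
    ]
    # An invalid reference perturbs both of its adjacent intervals.
    return [k for k in range(1, len(refs) - 1) if bad_gap[k - 1] and bad_gap[k]]
```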

4.4 Synchronization 

Synchronizing two traces means mapping the timestamps of the first trace to values compatible with those of the second. WiPal computes such a mapping with an affine function $t_2 = a\,t_1 + b$. It estimates $a$ and $b$ with the help of reference frames as the process runs.
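The text does not name the regression method used in the next paragraph. Assuming ordinary least squares over the $n$ reference frames of a window, with $t_{1,k}$ and $t_{2,k}$ the arrival times of reference frame $k$ in T1 and T2, and $\bar{t}_1$, $\bar{t}_2$ their means over the window, the estimates would be:

```latex
\hat{a} = \frac{\sum_{k=1}^{n} (t_{1,k} - \bar{t}_1)(t_{2,k} - \bar{t}_2)}{\sum_{k=1}^{n} (t_{1,k} - \bar{t}_1)^2},
\qquad
\hat{b} = \bar{t}_2 - \hat{a}\,\bar{t}_1
```

With windows of $w + 1$ reference frames, $n = w + 1$ (so $n = 3$ for $w = 2$).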

WiPal's synchronization process operates on windows of $w + 1$ reference frames (finding an optimal value of $w$ is discussed below). For each reference frame $R_i$, the process performs a linear regression using reference frames $R_{i-\lfloor w/2 \rfloor}, \ldots, R_{i+\lceil w/2 \rceil}$. At the beginning and at the end of the trace, it uses $R_1, \ldots, R_{w+1}$ and $R_{N-w}, \ldots, R_N$, respectively ($N$ is the number of reference frames). The result gives $a$ and $b$ for all frames between $R_i$ and $R_{i+1}$.
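A Python sketch of this windowing scheme, assuming ordinary least squares for the regression and reference frames given as (t1, t2) pairs sorted by t1 (indices are 0-based in the code, 1-based in the text). The helper names are hypothetical. Frames before the first reference or after the last reuse the nearest segment's parameters, which is an assumption of this sketch rather than something the text specifies.

```python
from bisect import bisect_right
from typing import List, Tuple

Ref = Tuple[float, float]  # (arrival time in T1, arrival time in T2)


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of ys = a * xs + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def segment_params(refs: List[Ref], w: int = 2) -> List[Tuple[float, float]]:
    """(a, b) for each segment [R_i, R_{i+1}), fitted on a window of w + 1 references."""
    n = len(refs)
    if n < w + 1:
        raise ValueError("need at least w + 1 reference frames")
    params = []
    for i in range(n):
        # Centre the window on R_i, shifting it inward near both ends of the trace.
        start = min(max(i - w // 2, 0), n - (w + 1))
        window = refs[start:start + w + 1]
        params.append(fit_line([r[0] for r in window], [r[1] for r in window]))
    return params


def to_t2_clock(t1: float, refs: List[Ref], params: List[Tuple[float, float]]) -> float:
    """Map a T1 timestamp onto T2's clock with the parameters of its segment."""
    i = max(bisect_right([r[0] for r in refs], t1) - 1, 0)
    a, b = params[i]
    return a * t1 + b
```

With references generated from $t_2 = 2t_1 + 5$, every window recovers $a = 2$ and $b = 5$, and a frame at $t_1 = 15$ maps to 35.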

We performed a number of experiments that revealed that the optimal value for $w$ is 2 (i.e., WiPal performs linear regressions on 3-frame windows). Fig. 3 shows the results of two merge operations with varying window sizes. The merges concern channel 11 of the sahara - chihuahuan and kalahari - mojave sniffer pairs from uw/sigcomm2004. The average synchronization error is computed as follows. Consider only the subset $S$ of frames that are shared by both the first and the second trace, T1 and T2. For a given frame $f$, let $t_{f,1}$ be the arrival time of $f$ in T1 (after clock synchronization) and $t_{f,2}$ the arrival time of $f$ in T2. The average synchronization error is $\frac{1}{|S|} \sum_{f \in S} |t_{f,2} - t_{f,1}|$. As stated above, $w = 2$ leads to the minimum average synchronization error. Note that techniques that use $w = 1$ (i.e., that perform linear interpolation between pairs of reference frames) lead to the worst synchronization error. Furthermore, merging traces with
$w = 1$ misses some shared frames. Table 1 shows the number of frames that are identified as duplicates in the input traces. Whereas using $w > 1$ always gives identical results, using $w = 1$ leads to some missed duplicates (7,455 for sahara - chihuahuan and 84 for kalahari - mojave). Although this is a small number compared to the total number of frames in the output traces, it indicates that synchronizing traces using linear interpolation (as Wit [8] does) may lead to incorrect results. Unfortunately, it is difficult to know whether some duplicates were missed when $w > 1$ (we do not know which frames to expect as duplicates).

  Merged traces          Shared frames (w = 1)    Shared frames (w > 1)
  sahara - chihuahuan    32,312,812               32,320,267
  kalahari - mojave      840,143                  840,227

Table 1: Number of frames found to be shared by both input traces when merging sahara - chihuahuan and kalahari - mojave with w = 1 and w > 1 (channel 11).

4.5 Merging

We now present how WiPal performs the final step, namely the merging process itself. Its role is to copy frames from the synchronized traces to the output trace. It must, of course, order its output correctly while avoiding duplicate frames.

Algorithm 3 details WiPal's merging algorithm. For the sake of illustration, we present here a simplified version that assumes that only one frame is emitted at a given time inside the monitoring area. It iterates over both inputs simultaneously; each iteration adds the earliest input frame to the output (lines 15 and 16). Duplicate frames are frames that have identical contents and whose arrival times are less than 106 µs apart (line 11). The rationale for this value is that 106 µs is half of the minimum gap between two valid IEEE 802.11 frames [5]. Therefore, identical frames appearing within such an interval are in fact a single occurrence of the same frame.

5. EVALUATION

This section evaluates WiPal using CRAWDAD's uw/sigcomm2004 dataset [9]. We investigate both the correctness and the efficiency of WiPal. We merge all traces sniffed on channel 11 and then use some heuristics to evaluate the quality of the result. We also analyze WiPal's speed.

Traces from five sniffers compose the uw/sigcomm2004 dataset: chihuahuan, kalahari, mojave, sahara, and sonoran. Fig. 4 shows the sequence we used to merge all traces. The reason kalahari and mojave share so few frames is that kalahari's trace is an order of magnitude smaller than mojave's.

Algorithm 3 WiPal's merging algorithm.
Input: two synchronized traces T1 and T2.
Output: the merge of T1 and T2.
 1: procedure Advance(f: frame, T: trace)
 2:   Append f to output; f ← T's next frame (or nil)
 3: end procedure
 4: f1 ← T1's first frame; f2 ← T2's first frame
 5: while f1 ≠ nil or f2 ≠ nil do
 6:   if f1 = nil then Advance(f2, T2)
 7:   else if f2 = nil then Advance(f1, T1)
 8:   else
 9:     t_f1 ← f1's time of arrival
10:     t_f2 ← f2's time of arrival
11:     if f1 = f2 and |t_f1 − t_f2| < 106 µs then
12:       Append either f1 or f2 to output
13:       f1 ← T1's next frame (or nil)
14:       f2 ← T2's next frame (or nil)
15:     else if t_f1 < t_f2 then Advance(f1, T1)
16:     else Advance(f2, T2)
17:     end if
18:   end if
19: end while
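Algorithm 3 can be sketched in Python as follows. The sketch assumes each trace is a list of (arrival time in µs, frame bytes) pairs sorted by time, with T2 already mapped onto T1's clock, and takes frame equality to mean byte-for-byte equal contents. Like the listing, it assumes at most one frame on the air at any given time; the 106 µs threshold is the `window` parameter. The name `merge` is illustrative.

```python
from typing import List, Tuple

Frame = Tuple[int, bytes]  # (arrival time in microseconds, frame contents)


def merge(t1: List[Frame], t2: List[Frame], window: int = 106) -> List[Frame]:
    """Sketch of Algorithm 3 for two time-sorted, synchronized traces."""
    out: List[Frame] = []
    i = j = 0
    while i < len(t1) or j < len(t2):
        if i == len(t1):                 # T1 exhausted: Advance(f2, T2)
            out.append(t2[j])
            j += 1
        elif j == len(t2):               # T2 exhausted: Advance(f1, T1)
            out.append(t1[i])
            i += 1
        else:
            (ta, fa), (tb, fb) = t1[i], t2[j]
            if fa == fb and abs(ta - tb) < window:
                out.append(t1[i])        # duplicate: emit a single copy
                i += 1
                j += 1
            elif ta < tb:                # emit the earlier frame first
                out.append(t1[i])
                i += 1
            else:
                out.append(t2[j])
                j += 1
    return out
```

When two frames are recognized as duplicates, this sketch keeps T1's copy; the listing allows either.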

5.1 Correctness 

Checking the correctness of the output is difficult. Being able to test whether traces are correctly merged would be equivalent to knowing exactly in advance what the merge should look like. Unfortunately, there is no reference output to compare against. Thus, we propose several heuristics to check whether WiPal introduces inconsistencies in its outputs. We also check WiPal's correctness with a test suite of synthetic traces for which we know exactly what output to expect.

A broken merging process could lead to several inconsistencies in the output traces. For the uw/sigcomm2004 dataset, we investigate two of those inconsistencies in particular: duplicate unique frames and duplicate data frames.

Duplicate unique frames. As seen previously, every unique frame should occur only once in the traces (including merged traces). Yet, it is difficult to avoid collisions in practice (see Section 4.2). Thus, one should not consider all collisions as inconsistencies. When merging uw/sigcomm2004, the final trace has 5 collisions. We manually verified that they are not inconsistencies introduced by WiPal's merging process.

Duplicate data frames. We search traces on a per-sender basis for successive duplicate data frames (only considering non-retransmitted frames). Such cases should not occur in theory: without retransmissions, sequence numbers should at least vary. Surprisingly, traces from uw/sigcomm2004 contain 20,303 such anomalies. We have no explanation for why the dataset exhibits this phenomenon. We checked, however, that the merged trace does not have more duplicates than the original traces.

Figure 4: Summary of uw/sigcomm2004's merging process, channel 11. Percentages indicate the proportion of frames shared by parent traces. Bottom figures indicate the average synchronization error.
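The two heuristics above can be sketched as follows. The record layout (sender, is_data, is_retry, frame bytes) and the function names are assumptions for illustration. In this sketch, "successive" means consecutive non-retransmitted data frames from the same sender, and a duplicate is a byte-for-byte identical frame.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, Set, Tuple


def duplicate_unique_frames(digests: Iterable[Hashable]) -> Set[Hashable]:
    """Unique-frame hashes that occur more than once in a (merged) trace."""
    return {d for d, count in Counter(digests).items() if count > 1}


def duplicate_data_frames(frames: Iterable[Tuple[str, bool, bool, bytes]]) -> int:
    """Count successive identical non-retransmitted data frames, per sender.

    frames: (sender, is_data, is_retry, frame bytes) records in trace order.
    """
    last: Dict[str, bytes] = {}
    anomalies = 0
    for sender, is_data, is_retry, body in frames:
        if not is_data or is_retry:
            continue                      # only non-retransmitted data frames
        if last.get(sender) == body:      # same bytes as this sender's previous one
            anomalies += 1
        last[sender] = body
    return anomalies
```

Comparing the anomaly count of the merged trace with the counts of the input traces gives the check described above.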

5.2 Efficiency 

Merging all the traces (73 GB) takes about 2 hours and 20 minutes (real time) on a 3 GHz processor with 2 GB of RAM. We balance merge operations across two hard drives, whose average throughputs during computations are about 60 MB/s and 30 MB/s. The average CPU usage is 75%, which suggests that the merge is partly I/O-bound: with faster hard drives, it could complete in about 1 hour and 40 minutes.

Comparing WiPal with online trace mergers does not make much sense: their mode of operation is different, and they also have different requirements (e.g., wired connectivity and loose synchronization). The comparison would be unfair. We can, however, compare WiPal with Wit [8], another offline merger. Wit works on top of a database backend, which means that trace files need to be imported into a database before any further operation can begin (e.g., merging or inferring missing packets). Using the same machine as before, importing channel 11 of uw/sigcomm2004 into Wit's database takes around 33 hours (user time). This means that, before Wit even begins its merge operations, WiPal can perform at least 14 runs of a full merge on the same data. WiPal thus brings tremendous speed improvements. One reason for such a difference is that WiPal uses high-performance C++ code, whereas Wit is a set of Perl scripts that use SQL to interact with a database.
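The figure of 14 runs follows from the two timings above (Wit's 33-hour import, measured in user time, against WiPal's 2 hours and 20 minutes for a full merge, measured in real time):

```latex
\frac{33\ \text{h}}{2\ \text{h}\ 20\ \text{min}} = \frac{1980\ \text{min}}{140\ \text{min}} \approx 14.1
```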

6. CONCLUSION 

This paper introduced the WiPal trace merger. As an offline merger, WiPal requires neither that sniffers be synchronized nor that they have access to a wired infrastructure. WiPal provides several improvements over existing equivalent software: (i) it comes as a simple program able to manipulate trace files directly, instead of requiring a more complex software setup; (ii) its synchronization algorithm offers better precision than existing algorithms; and (iii) it has a clean modular design. Furthermore, we showed that WiPal is an order of magnitude faster than Wit [8], the other available offline merger.

We have several plans for the future of WiPal. First, we are currently extending it to include features other than merging; for instance, WiPal will compute traffic statistics on IEEE 802.11 traces. We will also make better use of WiPal's modularity and test other algorithms for the various stages of the merging operation.